Abstract

We present a model for the diffusion of management fads and other 
technologies which lack clear objective evidence about their merits. The 
choices made by non-Bayesian adopters reflect both their own evaluations 
and the social influence of their peers. We show, both analytically and 
computationally, that the dynamics lead to outcomes that appear to be 
deterministic in spite of being governed by a stochastic process. In other
words, when the objective evidence about a technology is weak, the evolution
of this process quickly settles down to a fraction of adopters that is
not predetermined. When the objective evidence is strong, the proportion
of adopters is determined by the quality of the evidence and the adopters' 
competence. 
1 Introduction



In domains such as management and education, organizational practices often 
seem to come and go in puzzling ways. They are introduced with fanfare, 
but then they diffuse with little evidence that they work well. When they are
later discarded, it is again with little conclusive evidence
about their performance. Consider, for example, Quality Circles (QCs). Possibly as
many as 90 percent of the Fortune 500 companies had adopted QCs by 1985,
but "more than 80% of the Fortune 500 companies that adopted QCs
in the 1980s had abandoned them by 1987" [Abr96, pp. 256]. Yet one is hard
pressed to find hard evidence of their impact, even after the fact.

Fads are so common in American education that one observer reports that 
"School leaders in a whimsical mood sometimes play a parlor game called 'Spot 
That Jargon,' in which the goal is to name as many past educational fads as 
possible. The list is usually impressive." [Las98, MC04]

This paper examines the diffusion of such innovations or ideas. We call 
them soft technologies, not because of their physical properties but because 
evidence for or against them is equivocal, inconclusive, or even nonexistent. We 
contend that the choices made by adopters are quasi-rational: they reflect both 
an attempt to assess the imperfect data surrounding such innovations as well as 
a reliance on social cues, i.e., what peers have done. We argue that these two 
elements are linked by what could be called Festinger's Hypothesis: the more 
equivocal the evidence, the more people rely on social cues [Fes54, pp. 118].

In this paper we present a model that considers soft technologies as those for 
which (1) the objective evidence is weak and (2) people rely heavily on the prior 
choices of people in similar roles. We then show that the dynamics of the model
lead to outcomes that appear to be deterministic in spite of being governed
by a stochastic process. In other words, when the objective evidence for the
adoption of a soft technology is weak, any sample path of this process quickly
settles down to a fraction of adopters that is not predetermined by the initial
conditions: ex ante, every outcome is just as (un)likely as every other. In the
case when the objective evidence is strong, the process settles down to a value 
that is determined by the quality of the evidence. In both cases the proportion 
of adopters of the technology never settles into either zero or one. 

1.1 Related Work 

In the most highly developed mathematical models of fads — economic theories 
of "herding" — decision makers also use social cues but do so in perfectly rational 
ways, via Bayesian updating. 1 Though we agree that social cues matter we think 
that the premise of Bayesianism exaggerates the rationality of agents facing the 
difficult decision of whether or not to adopt a soft technology. In particular, 

1 There is now a large literature on informational cascades triggered by fully rational agents:
see the annotated bibliography of Bikhchandani, Hirshleifer and Welch, available on the Web
[BHW96]. For seminal papers in this line of work see [Ban92, BHW92, Wel92].
there is little evidence for the claim that people are perfect Bayesians. A leading
experimental economist summarizes the evidence as follows: 

Much research in cognitive psychology suggests that the way in 
which people form judgments of probability departs systematically 
from the laws of statistics and from Bayesian updating. (This should 
not be surprising, because there is no reason to think that evolution 
of brain processes like memory, language, perception, categorization, 
and reasoning would have adapted us to use a rule that Bayes only 
"discovered" a couple of hundred years ago.) Some research points 
toward systematic departures, or "biases", which spring from a small
number of "heuristics", like anchoring, availability, and representativeness
[Cam98, pp. 171].

Thus, the theoretical value of herding models — the intriguing demonstration 
that what appears to be conformity behavior in the aggregate is consistent with 
perfectly rational action of individuals — should not be confused with empirical 
confirmation of its micro-postulates. As a purely theoretical point it is interesting
to recognize that perfect information-processing by individual agents is,
under certain circumstances, consistent with conformity-like behavior. But we
suspect that to the extent that such models receive empirical support, the support
will be "weak" in the sense that the data on conformity or herding will also
be consistent with a wide variety of other sensible-though-suboptimal forms of
individual information-processing. Again, Camerer's assessment of Bayesianism
is pertinent: 

As a descriptive theory, Bayesian updating is weakly grounded in 
the sense that there is little direct evidence for Bayesian updating 
which is not also consistent with much simpler theories. Most of the 
evidence in favor of Bayesian updating boils down to the fact that if 
new information favors hypothesis A over B, then the judged probability
of A, relative to B, rises when the information is incorporated.
This kind of monotonicity is consistent with Bayesian updating but
also with a very wide class of non-Bayesian rules (such as anchoring
on a prior and adjusting probabilities up or down in light of the
information) [Cam98, pp. 171].

In this paper we propose a model that is consistent with all of Camerer's observations
and so is an alternative to canonical herding models. Thus our agents
exhibit normatively desirable and empirically plausible monotonicity properties: 
in particular, the more the social cues favor innovation A over B, the more likely 
it is that an agent will select A, ceteris paribus. Yet the reasoning that underlies 
such choices is adaptively rational rather than fully rational. Moreover, unlike 
many adaptive models of fads, the present model generates analytical solutions, 
not just computational ones. 2 

2 Many (perhaps most) adaptive models of fads are what has come to be called "agent-based
models" and it is virtually a defining feature of such models that they be computational.
(For a survey of agent-based models, including several applied to fads, see [MW02].)
2 The Model



A world in which people can make mistakes (e.g., adopt an inferior method of 
instruction, partly because many other school districts have already done so) 
and where they are influenced by the possibly erroneous, possibly correct choices 
of similar decision makers is inherently probabilistic. Hence our model is built 
around a probabilistic choice process. 

We assume that two alternatives, A and B, diffuse through a population of 
decision makers. 3 In every period one decision maker makes up his mind about 
whether to adopt A or B; this choice is final. (In this sense the formulation is 
like most "contagion" models.) The diffusion continues until everyone in the 
population has selected either A or B. To get the process going, we assume 
that initially (t = 0) at least one person has made a choice, i.e., at least one 
person already champions either A or B. However, we allow for the possibility 
that either option may have multiple initial champions. (One could regard these 
early champions as the inventors of the two alternatives.) A useful benchmark 
case to keep in mind is a fair start, in which A and B are backed by the same 
number of initial champions. The numbers of A- and B-champions at $t = 0$ will
be denoted by $m_0$ and $n_0$.

The heart of the model is how agents decide on which option to adopt. As 
noted earlier, we assume that there are two components to the adoption deci- 
sion. The first is based on individual judgment; the second, on social influence. 
Regarding the first component, we assume that a person isolated from social 
influence would choose the objectively superior option (labeled A in our model) 
with probability p. Thus p reflects the quality of the evidence about the relative 
merits of A versus B, plus whatever a priori bias (possibly due to a folk theory) 
exists. If A's superiority is obvious then p will be close to 1; if the two alterna- 
tives are nearly interchangeable or if evaluation technologies are primitive then 
p will be close to 1/2. If there is an a priori, theory-driven bias against A, then 
p could be less than 1/2. In general p is in (0, 1). 

We assume that the impact of social influence is linearly increasing in the
proportion of the "converted" who have adopted a particular option. Thus,
if $M_t$ denotes the number of people who by period $t$ have chosen A and $N_t$
denotes the number who have selected B, then the social pressure to choose
A is simply $M_t/(M_t + N_t)$. The social pressure to choose B is, of course,
$N_t/(M_t + N_t)$. 4 (We will use $m_0$ and $n_0$ to denote the initial number of adherents
to A and B, respectively.)

Since we are trying to construct a simple benchmark model that has a clean 
structure, we assume that an agent's choice is simply a weighted average of the 

3 We shall often interpret A and B as competing innovations, but the model allows for 
different interpretations. For example, one of the options could be the population's status 
quo alternative while the other is an innovation. We will return to this specific interpretation
in Section 5.

4 This is consistent with a simple search process: if the decision maker looks for social cues 
(i.e., the choices that the already-converted have made), then with probability $M_t/(M_t + N_t)$
the first convert she bumps into is an A-adherent. And so with that probability she is
persuaded to adopt A (conditional on her choosing via social imitation). 
above components (individual judgement and social influence). Thus

$$P[\text{agent in period } t+1 \text{ chooses } A] = a \cdot p + (1-a)\,\frac{M_t}{M_t+N_t}, \tag{1}$$

where $0 < a < 1$. Similarly, the probability that the chooser in period $t + 1$
selects B is

$$P[\text{agent in period } t+1 \text{ chooses } B] = a \cdot (1-p) + (1-a)\,\frac{N_t}{M_t+N_t}. \tag{2}$$
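
To make the choice rule in Eqs. (1) and (2) concrete, the following is a minimal simulation sketch of a single sample path. The function name `simulate_path`, its parameter names, and the seeded `random.Random` generator are illustrative assumptions; the model itself specifies only the adoption probability.

```python
import random

def simulate_path(a, p, m0=1, n0=1, steps=1000, seed=0):
    """Simulate one sample path of the adoption process in Eq. (1).

    Returns the fractions M_t / (M_t + N_t) for t = 0, ..., steps.
    Names and defaults are illustrative, not taken from the paper.
    """
    rng = random.Random(seed)   # seeded only for reproducibility (an assumption)
    m, n = m0, n0
    fractions = [m / (m + n)]
    for _ in range(steps):
        # Eq. (1): weighted average of individual judgment and social pressure.
        prob_a = a * p + (1 - a) * m / (m + n)
        if rng.random() < prob_a:
            m += 1              # the new adopter chooses A
        else:
            n += 1              # the new adopter chooses B
        fractions.append(m / (m + n))
    return fractions

path = simulate_path(a=0.1, p=0.7, steps=1000)
print(f"final fraction of A-adopters: {path[-1]:.3f}")
```

Changing `seed` while holding `a` and `p` fixed gives different sample paths of the same process.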

Festinger's hypothesis — that people are more open to social influence when 
evidence is equivocal — amounts here to assuming that a and p are positively 
correlated if p > 1/2 and negatively correlated if p < 1/2. We believe that this 
is a sensible proposition but we do not require it for our analytical results. We 
do use it in most of our simulations, however. Moreover, Festinger's hypothesis 
informs our understanding of what we consider soft technologies: in the context 
of Eq. (1), a soft technology is one with a $p$ in the vicinity of 1/2 and a low $a$.

To get a feel for how this process works it is useful to consider first the two 
extreme cases: i.e., when a = 1 and a = 0. The former is simply an independent 
trials process with a probability of "heads" of p. This process and its properties 
are well-understood. The case of $a = 0$ (pure social influence) is the standard
Polya's urn process [PE23]. Given that we are particularly interested in soft
technologies, which here are represented by low values of a, we will pause for a 
moment to recapitulate its features. 



3 Pure social influence (a = 0) 

Suppose 100 people have made up their minds, with 70 having chosen A and 
30 having chosen B. Then in the current period agent 101 has a 70% chance 
of selecting A. Consequently the expected proportion of the population who
cleave to A is $0.7\,(71/101) + 0.3\,(70/101) = (0.7 \cdot 71 + 0.3 \cdot 70)/101 = 70.7/101 = 0.70$; i.e.,
the expected proportion exactly equals the current proportion. It is easy to
show that this martingale property holds in general: on average the pure social 
influence process stays exactly where it currently is. That is, 



$$E\left[\frac{M_{t+k}}{M_{t+k}+N_{t+k}} \,\middle|\, F_t\right] = \frac{M_t}{M_t+N_t} = F_t \tag{3}$$



for all k > 0. Hence the pure social influence process is strongly path dependent: 
in expectation it tends to stay wherever it is — i.e., wherever it has arrived via 
the particular sample path it has been traveling. 
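
The one-step version of this martingale property can be checked with exact arithmetic. The sketch below reproduces the 70/30 example; the helper name `expected_next_fraction` is an illustrative assumption. The multi-step claim then follows by applying the one-step result repeatedly (iterated expectations).

```python
from fractions import Fraction

def expected_next_fraction(m, n):
    """Expected fraction of A-adopters after one more pure-imitation choice (a = 0)."""
    total = m + n
    prob_a = Fraction(m, total)   # next adopter imitates an A-adherent
    prob_b = Fraction(n, total)   # next adopter imitates a B-adherent
    return prob_a * Fraction(m + 1, total + 1) + prob_b * Fraction(m, total + 1)

print(expected_next_fraction(70, 30))   # 7/10, the current proportion
```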

Proposition 1. If $m_0 = n_0 = 1$ (and $a = 0$), then $P[M_t = 1] = P[M_t = 2] =
\cdots = P[M_t = t + 1] = 1/(t + 1)$, for all $t > 0$.

Proof. This is a classical result [PE23]. □
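
Proposition 1 can also be verified numerically for small $t$ by propagating the exact distribution of $M_t$ one period at a time. The following is a sketch under the choice rule of Eq. (1); the function name `adopter_count_pmf` and its signature are assumptions made for illustration.

```python
from fractions import Fraction

def adopter_count_pmf(t, m0=1, n0=1, a=Fraction(0), p=Fraction(1, 2)):
    """Exact distribution of M_t as a dict {count of A-adopters: probability}."""
    pmf = {m0: Fraction(1)}
    for step in range(t):
        total = m0 + n0 + step            # M + N before this period's choice
        nxt = {}
        for m, prob in pmf.items():
            q = a * p + (1 - a) * Fraction(m, total)   # Eq. (1)
            nxt[m + 1] = nxt.get(m + 1, 0) + prob * q
            nxt[m] = nxt.get(m, 0) + prob * (1 - q)
        pmf = nxt
    return pmf

# Pure social influence (a = 0) with one initial champion on each side.
print(adopter_count_pmf(4))
```

With the default arguments every value in the returned dictionary equals $1/(t+1)$. Calling `adopter_count_pmf(4, 2, 2)` instead gives a distribution that is symmetric but no longer uniform.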
Figure 1: A simulation of the pure Polya's urn process for 1000 rounds. The
initial condition is one A and one B. As can be seen from the figure, the three 
sample paths converge to different limits. 



Thus every feasible outcome is equally likely at every date: given a fair start, 
the pure process of linear social pressure is completely 'blind'. Therefore the 
process is just as likely to wind up generating a heterogeneous diffusion, with 
half the population championing A and the other half B, as it is to wind up 
with a sharply skewed outcome, with nearly everyone backing A. 5 

It is instructive to look at a typical sample path; see Fig. 1. Note that
initially the relative frequencies of A- versus B-adherents swing wildly: with 
only a small number of initial converts, early adopters have a lot of influence. 
But once hundreds of people have taken sides subsequent adopters have little 
impact on relative frequencies, and so the process settles down. Hence it may 
appear to agents inside the process that the diffusion is moving toward some 
predetermined equilibrium. Part of this impression is correct: as $t \to \infty$, any
sample path of this process will settle down near some long-run proportion
of A-adherents and B-adherents. But we know from Proposition 1 that this
asymptotic state was not at all predetermined: ex ante, every outcome is just
as (un)likely as every other.

What does the pure Polya process tell us about the complex process, in 
which objective evidence does play a role? Recall that by Festinger's hypothesis, 
soft technologies have low a's. Moreover, an agent's adoption probability, as 
represented by Eq. (1), is continuous in a. Hence for a "close" to zero the

5 It is worth mentioning that Proposition 1 depends upon the initial seed being $m_0 =
n_0 = 1$. If it is a fair start but there are more than two initial champions then the resulting
distributions are not uniform, though they are symmetric around 1/2. Nevertheless, the
martingale conditional expectations property does continue to hold, for any values of $m_0$ and
$n_0$. So in this sense the strong path dependence property is insensitive to the initial conditions.
diffusion will approximate a pure social influence process: it will be "nearly"
blind, ex ante. 

In such circumstances agents are influenced by a mixture of conformity and 
individual judgment about the evidence. Yet they may not have good access 
to the fine structure of their own choice processes [WB98]. For one thing, they
may not recognize how much of their choice is influenced by social cues. And 
even if they do, they may rationalize that part of it, along the lines of herding 
models: the behavior of peers conveys information, and so it is rational to be 
socially influenced. And of course that may be so. But if a is sufficiently low 
there is a good chance (less than half but still appreciable) that at date t a 
majority of the converts will back the weaker option; hence the next adopter 
could be led astray. 

Further, we suspect that people involved in a diffusion of a soft technology
do not have good intuitions for the stochastic properties of the process. In
particular, we suspect it is underappreciated that, with a low $a$ and
a large population of adopters, a soft technology could wind up at many
different outcomes with nearly equal ex ante probabilities. Life unfolds as a sample
path, and the "settling down" feature of the particular sample path one inhabits
(as in Fig. 1) will be much more salient than theoretical ex ante probabilities, if
the latter are recognized at all.

4 Properties of the Complex Process: $a \in (0, 1)$
4.1 The mean of the process 

We now directly investigate properties of the complex process. First let us 
examine how it behaves over time. 

Proposition 2. Assume $a > 0$. Then $E F_t \to p$ monotonically as $t \to \infty$.

In fact, instead of expectations, we can establish a strong convergence result:

Proposition 3. $F_t \to p$ a.s. as $t \to \infty$.

Thus the process exhibits a constant drift or bias toward $p$, the value of
individual judgment. The reason is that the martingale property
is broken. In fact, instead of Eq. (3), it can be shown (see Appendix) that for
the complex process, $a \in (0, 1)$,

$$E[F_{t+1} \mid F_t] = F_t + \frac{a\,(p - F_t)}{M_t + N_t + 1}. \tag{4}$$

Note that the process could be mostly one of social construction (most of the
weight is on imitation) yet improvement will tend to occur anyway, if $m_0/(m_0 + n_0) <
p$. Henceforth we denote $m_0/(m_0 + n_0)$ by $f_0$.
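
The one-step drift toward $p$ can be checked directly from Eq. (1). The short derivation below is a sketch; the indicator $X_{t+1}$ is notation introduced here for illustration (it equals 1 if the agent in period $t+1$ chooses A and 0 otherwise).

```latex
\begin{align*}
F_{t+1} &= \frac{M_t + X_{t+1}}{M_t + N_t + 1},
\qquad P[X_{t+1} = 1 \mid F_t] = a p + (1-a) F_t, \\
E[F_{t+1} \mid F_t]
&= \frac{(M_t + N_t)\,F_t + a p + (1-a) F_t}{M_t + N_t + 1} \\
&= F_t + \frac{a\,(p - F_t)}{M_t + N_t + 1}.
\end{align*}
```

Setting $a = 0$ makes the second term vanish, which recovers the martingale property of the pure social influence process.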

Next let us consider how F t is affected by variations in the model's two basic 
parameters, a and p. The latter's effect is both obvious and unconditional. 
Clearly, the average fraction of correct choices, EF t , is increasing in p. But 
more can be said. 
Figure 2: Simulations of the complex process after 1000 rounds. In both cases
($a = 0.9$ and $0.1$) we choose $p = 0.7$ and initial condition one A and one B. In
each figure three sample paths are shown. As can be seen, the convergence rate 
for small a is much slower than the rate for large a. 



Proposition 4. A higher value of $p$ yields a distribution of $F_t$ that stochastically
dominates a distribution of $F_t$ produced by a lower value of $p$, for all $t > 0$.

This immediately implies that $E F_t$ is increasing in $p$. 6

It is worth emphasizing that this strong effect obtains even if the process 
is mostly one of social construction (low a's). Thus the presence of social 
construction does not utterly prevent "rational engineering" (e.g., concerning 
the technology of evaluation) from having benign effects. (Of course, in a sense 
this is built into the model, but hopefully it is built-in in a plausible way.) 

Further, Festinger's hypothesis implies that if p > 1/2, then as the evaluation 
technology improves, decision makers will rely more on the evidence and less 
on social cues: the rise in p will be followed by an increase in a. This indirect 
impact of changes of p might be just as important as its direct effect on EF t . 
This naturally raises the question, how do changes in a affect the process? 

Proposition 5.

(i) If $f_0 < p$, then $E F_t$ is increasing in $a$, for all $t > 0$.

(ii) If $f_0 > p$, then $E F_t$ is decreasing in $a$, for all $t > 0$.

Thus, absent a lucky start ($f_0 > p$), less reliance on social cues improves
matters. Consequently, if Festinger's hypothesis is correct, improvements in
evaluation technology have two benign effects on diffusions that don't enjoy
lucky starts: the obvious direct one (Proposition 4) and a less obvious indirect
one (Proposition 5) via increases in $a$.
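
Propositions 2 and 5 can be illustrated numerically. The sketch below assumes the one-step drift $a(p - F_t)/(M_t + N_t + 1)$ that follows from Eq. (1); because that drift is linear in $F_t$, taking expectations gives an exact recursion for $E F_t$. The function name `expected_fraction` is an illustrative assumption.

```python
def expected_fraction(t, a, p, m0=1, n0=1):
    """E[F_t], obtained by iterating the expected one-step drift toward p."""
    s = m0 + n0                 # size of the initial seed
    mean = m0 / s               # E[F_0] = f_0
    for step in range(t):
        # After `step` periods there are s + step adopters in total.
        mean += a * (p - mean) / (s + step + 1)
    return mean

for a in (0.1, 0.5, 0.9):
    row = [round(expected_fraction(t, a, p=0.7), 4) for t in (1, 10, 100, 1000)]
    print(f"a = {a}: {row}")
```

With a fair start and $p = 0.7$, each printed row rises toward 0.7, and rows with larger $a$ lie above rows with smaller $a$ at every date.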

4.2 The unpredictability of the process 

If we compare the two pure processes, it is evident that, given a standard start
($m_0 = n_0 = 1$), the pure social process is less predictable (as measured by
the variance of $F_t$). When the diffusion is purely a matter of social construction,
every feasible outcome is equally likely; this is intuitively very random,
and it translates into a high variance. In contrast, even in the variant
of the pure individualistic process with maximal variance ($p = 1/2$), extreme
outcomes (nearly everyone choosing A or B, for example) are less likely than
other outcomes. So the variance must be less than that produced by pure social
imitation.

Since the complex process is, at the level of individual choice, a weighted 
average of the two pure processes, it is intuitively reasonable that the variance 
of the complex process would be increasing in $1 - a$, the weight on the higher
variance component, given a fair start. This intuition is correct, but since it
conditions on a particular starting point it is incomplete. The next result shows
that more weight on the social component can reduce the variance of the complex
process, at least for a while.

6 However, the converse does not hold: an increase in the mean fraction of correct choices 
does not imply that one distribution stochastically dominates the other. So Proposition 4 is
stronger than just a comparison of means. 
Proposition 6. $\partial(\operatorname{var} F_t)/\partial a < 0$ for all $t > 0$ iff either

(i) $f_0 < p$ and $p_0 = (1 - a) f_0 + a p > 1/2$, or

(ii) $f_0 > p$ and $p_0 = (1 - a) f_0 + a p < 1/2$.

Thus for parameters in this range, increases in the weight on evidence both
improve community-wide accuracy (on average) and reduce the variance of outcomes.

While it makes sense that the more socially constructed a diffusion is the 
more variable are its outcomes, it is important to emphasize that this conclusion 
does not always hold, as is indicated by the "only if" part of Proposition 6. Here
is why. Suppose initially A-adherents are few and far between; for simplicity, let
$f_0 = 0$. Then if diffusion were based only on imitation ($a = 0$), everyone would
adopt the wrong option, and the process, although built completely on social
cues, would exhibit no variability whatsoever. In such a case small increases in
$a$ would increase the variance of the outcomes, at least in the early going. Of
course this increase in variance is thoroughly benign: without it, the diffusion 
would be stuck in a highly predictable but consistently inferior sample path. 
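
The $f_0 = 0$ example can be checked exactly. The sketch below tracks $E F_t$ and $E F_t^2$ with a moment recursion derived here from Eq. (1); the recursion and the function name `mean_and_variance` are illustrative assumptions, not taken from the paper.

```python
def mean_and_variance(t, a, p, m0, n0):
    """Exact mean and variance of F_t via first- and second-moment recursions."""
    s = m0 + n0
    m1 = m0 / s          # E[F_t]
    m2 = m1 * m1         # E[F_t^2]; F_0 is deterministic
    for step in range(t):
        size = s + step                       # M_t + N_t before the next choice
        e_q = a * p + (1 - a) * m1            # E[q], with q = a*p + (1-a)*F_t
        e_fq = a * p * m1 + (1 - a) * m2      # E[F_t * q]
        # F_{t+1} = (size*F_t + X) / (size + 1), where X is Bernoulli(q).
        m2 = (size * size * m2 + 2 * size * e_fq + e_q) / (size + 1) ** 2
        m1 = (size * m1 + e_q) / (size + 1)
    return m1, m2 - m1 * m1

# One initial B-champion and no A-champions, so f_0 = 0.
for a in (0.0, 0.05, 0.1):
    mean, var = mean_and_variance(20, a, p=0.7, m0=0, n0=1)
    print(f"a = {a}: mean = {mean:.4f}, variance = {var:.5f}")
```

With $a = 0$ the variance is exactly zero, and it becomes positive as soon as $a > 0$.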

Proposition 7. 

(i) $\partial(\operatorname{var} F_t)/\partial p < 0$ for all $t > 0$ if $p > 1/2$ and $p_0 > 1/2$;
(ii) $\partial(\operatorname{var} F_t)/\partial p > 0$ for all $t > 0$ if $p < 1/2$ and $p_0 < 1/2$.

Regarding part (ii), note that increases in p increase the variance in a socially
desirable way. However, people who like predictability could resent the
"rationalizing" effect of improvements in program information and evaluation 
technology, because in these circumstances increasing p makes things "messier" 
and less predictable. 

As a matter of interpretation, one might say that $p < 1/2$ together with $p_0 < 1/2$
involves a "soft" technology with a vengeance: there is some kind of bias against
the better technology, maybe because it is not as trendy, and there is also a bias
in terms of the initial social proclivities ($p_0 < 1/2$). In this case, improving
evaluation technology also creates a "messier" and more confusing process ex
ante, in that such improvements increase the variance of the outcomes.

Indeed, even if $p > 1/2$ (i.e., the "standard" case), we know that in period
1 the variance of $F_1$ is increasing in $p$ if $p_0 < 1/2$. (And probably this will hold
for some finite number of periods after period 1 too.) This is a very natural case
of a soft technology: there is a social bias against the better technology (bad
luck, essentially, or maybe glamour is on the side of the weaker technology), in
that $f_0 < 1/2$, and things are noisy enough so that $p$, though above 1/2, is still
fairly low. Hence if Festinger's hypothesis were to kick in, so that $a$ were pretty
low too, then $p_0 < 1/2$, and the variance of $F_1$ would be increasing in $p$.

4.3 The effects of initial conditions 

There are two types of initial conditions: the size of the initial seed ($m_0 + n_0$)
and the proportion ($f_0 = m_0/(m_0 + n_0)$). Each has effects independently of the other
(i.e., holding the other fixed), so we will examine them separately and in a
ceteris paribus manner. 

4.3.1 Varying the size of the initial seed 

Proposition 8. Suppose $F'_t$ is a bigger process than $F_t$ in that $m'_0 = k m_0$ and
$n'_0 = k n_0$, where $k$ is an integer larger than one. In all other respects the two
processes are identical.

(i) $E F'_t < E F_t$ for all $t > 0$ iff $f_0 < p$.

(ii) $E F'_t > E F_t$ for all $t > 0$ iff $f_0 > p$.

Thus a bigger initial seed acts as an inertial anchor, slowing down the movement
of $E[F_t]$ to its attractor $p$.
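
The anchoring effect of a larger seed can be seen by tracking the gap $E F_t - p$. The sketch below assumes the expected one-step drift that follows from Eq. (1), under which each period multiplies the gap by $1 - a/(M_t + N_t + 1)$; the function name and the seed sizes used are illustrative assumptions.

```python
def expected_fraction(t, a, p, m0, n0):
    """E[F_t], computed by shrinking the initial gap f_0 - p period by period."""
    s = m0 + n0
    gap = m0 / s - p                       # E[F_0] - p
    for step in range(t):
        gap *= 1 - a / (s + step + 1)      # a larger seed gives a factor closer to 1
    return p + gap

# Same initial proportion f_0 = 1/2, but seeds of size 2 and 10 (k = 5).
for t in (1, 10, 100):
    small = expected_fraction(t, a=0.5, p=0.7, m0=1, n0=1)
    big = expected_fraction(t, a=0.5, p=0.7, m0=5, n0=5)
    print(f"t = {t}: small seed {small:.4f}, big seed {big:.4f}")
```

Here $f_0 < p$, so the process with the bigger seed has the lower mean at every date, as in part (i).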

4.3.2 Varying the initial proportions 

It is obvious that, all else equal, increasing the initial bias toward A (i.e., increasing
$f_0$) boosts $E F_t$ at every date. Suppose, e.g., we start out with $m_0 + 1$
A-converts, instead of with just $m_0$. This increases the social conformity pressure
for the agent in period 1 to adopt A. That in turn increases $E F_1$, the expected
fraction of the converts who adhere to A at the end of period 1, which in turn
provides more social cues to the decision maker in period 2 to adopt A and so
on.

But a higher $f_0$ has an even stronger effect, stronger even than ordinary
first-order stochastic dominance. To see in what sense, consider the following
definition.

Definition 1. Suppose the support of $F_t$ and $F'_t$ can be divided into three mutually
disjoint subsets: a non-empty set of high states $\{f_{h_1,t}, f_{h_2,t}, \ldots, f_{h_a,t}\}$, a
non-empty set of low states $\{f_{l_1,t}, f_{l_2,t}, \ldots, f_{l_b,t}\}$, and a possibly empty set of
intermediate states $\{f_{m_1,t}, f_{m_2,t}, \ldots, f_{m_c,t}\}$. Suppose the three subsets are connected
in the sense that any low state is less than any intermediate state, and
any intermediate state is less than any high state. We say that $F'_t$ is stochastically
bigger than $F_t$ in a strong sense if $P[F'_t = f_{h,t}] > P[F_t = f_{h,t}]$ for any
high state $h$, $P[F'_t = f_{m,t}] = P[F_t = f_{m,t}]$ for any intermediate state $m$, and
$P[F'_t = f_{l,t}] < P[F_t = f_{l,t}]$ for any low state $l$.

Thus a distribution of F t that is stochastically bigger in a strong sense puts 
strictly more weight on high fractions of converts who have chosen correctly, and 
strictly less weight on low fractions. (Clearly this implies first-order stochastic 
dominance, but the converse need not hold, so this is in fact a "strong sense" 
of stochastic dominance.) 

Proposition 9. All else equal, the distribution of $F_t$ is stochastically increasing,
in a strong sense, in $f_0$, for all $t > 0$.

Naturally, smaller $f_0$'s produce distributions of $F_t$ that are stochastically
smaller, in a strong sense. Suppose, therefore, that A is an innovation that
is superior to B. If B is not an innovation (it is in fact the status quo) and
the community is quite traditional (everyone initially uses B, though people are
willing to consider an innovation), then effectively $f_0$ equals zero. This is the
toughest possible starting point for an objectively superior innovation.

The effect of varying initial bias on the variance of the process is perhaps 
less intuitive. 

Proposition 10. $\partial(\operatorname{var} F_t)/\partial f_0 < 0$ for all $t > 0$ if $p > 1/2$ and $p_0 > 1/2$.



4.4 The convergence rate 

If we wish to use our complex process to explain diffusion of innovations in the
real world, studying only the limit behavior is not enough. If the characteristic
time needed to reach the asymptotic state is too long to be observed, then the
asymptotic state cannot be very meaningful in a practical sense. A rough
estimate of the convergence rate of the complex process can be obtained from
a mean-field approximation. Neglecting the noise term, the stochastic process
$F_t$ can be approximated by Eq. (4):

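$$F_{t+1} = F_t + \frac{a\,(p - F_t)}{t + s + 1}, \tag{5}$$

where $s = m_0 + n_0$ is the size of the initial seed. Further approximating $F_t$ by a continuous process $F(t)$, we can write

$$\frac{dF(t)}{dt} = a\,\frac{p - F(t)}{t + s + 1} \sim a\,\frac{p - F(t)}{t}. \tag{6}$$

Solving for $F(t)$, we find

$$|F(t) - p| \sim t^{-a}. \tag{7}$$

If we define a characteristic convergence time $T$ to be the time it takes $F(t)$ to
converge to within $\epsilon$ of $p$, then we have

$$T \sim \left(\frac{1}{\epsilon}\right)^{1/a}. \tag{8}$$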

Thus as $a \to 0$, the characteristic convergence time diverges exponentially in
$1/a$. Indeed, when $a = 0$ (Polya's urn) the process never converges to $p$. (To
be precise, the Polya process can converge to any $p \in [0, 1]$, but the probability
that it converges to any particular $p$ is zero.)
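
For readers who want the intermediate steps, the scaling of the convergence time follows from separating variables in the large-$t$ form of the mean-field equation. The derivation below is a sketch; the integration constant $C$ is notation introduced here.

```latex
\begin{align*}
\frac{dF}{dt} = a\,\frac{p - F}{t}
&\;\Longrightarrow\; \frac{d\,(p - F)}{p - F} = -a\,\frac{dt}{t} \\
&\;\Longrightarrow\; \ln|p - F(t)| = -a \ln t + C \\
&\;\Longrightarrow\; |F(t) - p| = e^{C}\, t^{-a}, \\
e^{C}\, T^{-a} = \epsilon
&\;\Longrightarrow\; T = \left(\frac{e^{C}}{\epsilon}\right)^{1/a} \sim \left(\frac{1}{\epsilon}\right)^{1/a}.
\end{align*}
```

As a rough illustration, ignoring the constant, $\epsilon = 0.01$ gives $T \sim 10^2$ for $a = 1$, $T \sim 10^4$ for $a = 0.5$, and $T \sim 10^{20}$ for $a = 0.1$.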

The mean-field estimate is in principle only for the mean of $F_t$, of course.
For a finer estimate of the variance of $F_t$, we need a central limit theorem.

Proposition 11.

(i) If $1/2 < a < 1$, then as $t \to \infty$, $\sqrt{t}\,(F_t - p)$ converges in distribution to
a normal distribution with mean zero and variance $p(1 - p)/(2a - 1)$. In
particular, $\operatorname{var} F_t = O(1/t)$.

(ii) If $0 < a < 1/2$, then $\operatorname{var} F_t \to 0$ slower than $O(1/t)$.
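
The two regimes in Proposition 11 can be illustrated by computing $t \cdot \operatorname{var} F_t$ exactly. The sketch below uses a second-moment recursion derived here from Eq. (1), which is an assumption of this illustration rather than the paper's own method; the function name `variance_of_fraction` is likewise hypothetical.

```python
def variance_of_fraction(t, a, p, m0=1, n0=1):
    """Exact var F_t from recursions for E[F_t] and E[F_t^2]."""
    s = m0 + n0
    m1 = m0 / s          # E[F_t]
    m2 = m1 * m1         # E[F_t^2]
    for step in range(t):
        size = s + step                       # adopters before the next choice
        e_q = a * p + (1 - a) * m1            # expected probability of choosing A
        e_fq = a * p * m1 + (1 - a) * m2      # E[F_t * q]
        m2 = (size * size * m2 + 2 * size * e_fq + e_q) / (size + 1) ** 2
        m1 = (size * m1 + e_q) / (size + 1)
    return m2 - m1 * m1

p = 0.7
for a in (0.9, 0.3):
    scaled = [t * variance_of_fraction(t, a, p) for t in (2000, 20000)]
    print(f"a = {a}: t * var F_t at t = 2000, 20000 -> {scaled}")
print("limit for a = 0.9:", p * (1 - p) / (2 * 0.9 - 1))
```

For $a = 0.9$ the scaled variance settles near $p(1-p)/(2a-1)$, while for $a = 0.3$ it keeps growing with $t$, which is consistent with the variance shrinking more slowly than $1/t$.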
4.5 Implications regarding herding and massive conformity

Assume, as in cascade models with rational agents, that the "initial seed" is 
exactly one person who chooses A with probability $p$ and B with probability
$1 - p$, and everyone else follows sequentially.

Definition 2. We say that there is "herding" if after some period T everyone 
makes the same choice. 

Fact 1. Herding occurs in our model (with positive probability) iff one of the
following conditions holds:

(a) $a > 0$ and $p = 1$; then we get herding on option A.

(b) $a > 0$ and $p = 0$; then we get herding on option B.

(c) $a = 0$; then we get herding on option A with probability $p$ and herding on
option B with the complementary probability $1 - p$.

Proof. Sufficiency. All three parts are trivial, by induction.

Necessity. Suppose none of (a)-(c) hold. Then we know that $a > 0$ and $p \in
(0, 1)$. By Prop. 3, $F_t \to p$ almost surely. Hence both A and B are chosen
infinitely often with probability one. □

Fact 2. Under the assumptions of our model, if heterogeneous behavior ever 
occurs then herding is impossible (with probability one). 

Proof. By Fact 1, if heterogeneous behavior has emerged by some date $t > 0$
then the extreme conditions (a)-(c) in Fact 1 cannot hold. Hence we know that
$a > 0$ and $p \in (0, 1)$. The rest is the same as the necessity part of Fact 1. □

Conformity could be overwhelming — nearly everyone in a community winds 
up making the same choice — without being complete. One of the surprising 
results of the rational cascade models is that massive-conformity-in-the-making 
can be very fragile. As Bikhchandani, Hirshleifer and Welch put it, "A little 
bit of public information (or an unusual signal) can overturn a long-standing 
informational cascade. That is, even though a million people may have chosen 
one action, seemingly little information can induce the next million people to 
choose the opposite action. Fragility is an integral component of the informational
cascades theory!" ([BHW96], original emphasis).

But this fragility is intimately linked to the agents' complete rationality and 
deep understanding of informational cascades. As Bikhchandani et al. remark, in
standard rational models "everyone knows that there is very little information
in a cascade" [BHW96]. That "everyone" pertains, of course, to the model's
rational agents, not to real decision makers: the evidence is that the latter do 

7 This is the term some scholars (e.g., IH.irJ l'i : use. Others (e.g., [BHW96]) call this an
informational cascade.
Figure 3: A simulation of the complex process with a = 0.5 and initial seed
one A and one B. p is set to 0.2 for the first 500 rounds and is updated to 0.3 
for another 500 rounds. The vertical line indicates the change point. As can be 
seen, the sample path is not significantly affected by the new level of p. 

not realize that there is very little information in a cascade. 8 The agents in 
our model are imperfectly rational and lack a deep understanding of cascades. 
Hence, diffusion processes in our model are not fragile (in the above sense). This 
is easily established by re-inspecting the choice probability $ap + (1-a)F_t$: the
probability that an agent makes the correct decision is continuous in p, so a little
bit of new public information, represented as a sudden positive shock to p, will
only increase the probability of choosing correctly by a little bit. (For an
illustration of this, see Fig. 3.)
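The robustness claim can be explored numerically. The following is a minimal sketch in Python, not the authors' simulation code: it assumes the choice probability $ap + (1-a)F_t$ from the Appendix, the seed of one A and one B described in the caption of Fig. 3, and a shock from p = 0.2 to p = 0.3 after 500 rounds. The function name `simulate_path`, its parameters, and the random seed are illustrative assumptions.

```python
import random

def simulate_path(a, p_schedule, m0=1, n0=1, seed=0):
    """Return [F_0, F_1, ...] for one sample path; p_schedule[t] is p in round t+1."""
    rng = random.Random(seed)   # fixed seed (an arbitrary choice) for reproducibility
    m, total = m0, m0 + n0
    path = [m / total]
    for p in p_schedule:
        prob_a = a * p + (1 - a) * m / total   # choice probability a*p + (1-a)*F_t
        if rng.random() < prob_a:
            m += 1                             # the new agent adopts A
        total += 1
        path.append(m / total)
    return path

schedule = [0.2] * 500 + [0.3] * 500   # shock to p after round 500
path = simulate_path(a=0.5, p_schedule=schedule)
print(f"F just before the shock: {path[500]:.3f}")
print(f"F after 1000 rounds:     {path[-1]:.3f}")
```

Each new choice moves $F_t$ by at most $1/(t+s+1)$, and the shock shifts the choice probability by only $a \cdot 0.1 = 0.05$, so the path cannot jump at the change point. Other seeds give other sample paths.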

5 Alternative Interpretations 

It is worth noting that one can use the model to represent situations in which 
there is a status quo option that everyone already uses. Then the diffusion is 
simply this: in every period one agent has an opportunity to either take up the 
innovation (say, A) or keep the status quo (say, B). Further, one could allow for 
the innovation to be objectively inferior to the status quo. 

Under this interpretation of a novel technology competing with a status quo, 
it is reasonable to suppose that $f_0$ equals zero. This is, from the point of view
of an innovation, the toughest possible starting point. 

If it's a soft innovation then presumably both p and a are relatively low,
so it is probably the case that $ap + (1-a)f_0$ is less than one-half. Hence by

8 This is the pattern that Kübler and Weizsäcker found in their experiments on cascades
[KW04]. As they put it, "players do not consider what their predecessors thought about their
respective predecessors. Thus, they do not understand that some of the decisions they observe
have been herding decisions, not based on any private information" (p. 438).
Proposition [?], decreasing the reliance on social cues (i.e., a smaller $1 - a$) would
increase the variance of outcomes.

6 Extensions of the Model 

6.1 Endogenizing Festinger's Hypothesis 

If Festinger's hypothesis is correct, then the weight on social cues should depend
on how difficult the choice is. That is, a should be a function of p. We can
stipulate a priori several properties that this function should have. First, $1 - a$
should reach its maximum when $p = 1/2$: it is in this circumstance that an
agent is maximally uncertain about the relative merits of A versus B and so,
following Festinger, s/he would be maximally reliant on social cues. Second, as
a benchmark the function should be symmetric around one half:
$f(.5 - k) = f(.5 + k)$, for $0 < k \le .5$.

A simple function with both properties is $1 - a = 4p(1 - p)$. Then the
probability that the chooser in period t would pick A would equal
$4p(1 - p) \cdot F_t + [1 - 4p(1 - p)] \cdot p$.

An interesting feature of this choice equation is that it creates the possibility
that the probability of a correct choice is decreasing in p. This seems bizarre but
it is explicable: it arises because a is a function of p. Note that $4p(1 - p)$, the
weight on social cues, is increasing when p is less than one half, since in this range
increases in individual-level accuracy make the choice problem more confusing
or troubling. Thus, as we suggested earlier, increases in p have two effects:
a direct effect on individualistic choice (which is always benign, as shown by
Proposition [?]) and an indirect one on the weights (which is not always benign).
For certain parametric values the indirect effect is sufficiently large and negative
so as to swamp the positive direct effect. 9
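The two effects can be tabulated directly. The sketch below is illustrative only: it assumes the social weight $1 - a = 4p(1 - p)$ and evaluates the choice probability with $F_t$ replaced by a fixed fraction `f`. The function names `prob_choose_a` and `d_prob_dp` are hypothetical, and the derivative is the one obtained by differentiating $4p(1-p)f + [1 - 4p(1-p)]p$ with respect to p.

```python
def prob_choose_a(p, f):
    """Probability of picking A when the social weight is 1 - a = 4p(1 - p)."""
    social_weight = 4 * p * (1 - p)
    return social_weight * f + (1 - social_weight) * p

def d_prob_dp(p, f):
    """Derivative of prob_choose_a with respect to p: 4(1 - 2p) f + 1 - 8p + 12p^2."""
    return 4 * (1 - 2 * p) * f + 1 - 8 * p + 12 * p ** 2

f0 = 0.0   # nobody has adopted A yet
for p in (0.10, 0.25, 0.40, 0.60, 0.90):
    print(f"p={p:.2f}  P(A)={prob_choose_a(p, f0):.4f}  dP/dp={d_prob_dp(p, f0):+.4f}")
```

With `f0 = 0` the derivative is $12p^2 - 8p + 1$, which is negative exactly for p between 1/6 and 1/2. In that range a more accurate individual signal lowers the probability of choosing A.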

Thus endogenizing Festinger's hypothesis will generate interesting hypothe- 
ses. 

6.2 Diffusion in Communities with Internal Structure 
6.2.1 Status Hierarchies 

There is considerable empirical support for the hypothesis that the higher the 
decision maker's status, the more impact his/her adoption choice has on the 
unconverted (e.g., [SS98, p. 275]). It would be easy to incorporate this empirical
regularity in a stark way in the current model: only high status agents are
imitated; the adoption choices of low status agents are ignored. 

9 To see this concretely, note that the derivative of $4p(1 - p) \cdot f_0 + [1 - 4p(1 - p)] \cdot p$ with
respect to p equals $4(1 - 2p) \cdot f_0 + (1 - 8p + 12p^2)$. Then suppose for example that $f_0 = 0$
and $p = .25$. With these parametric values the derivative equals $-.25$, so small increases in p
make the agent in period one more likely to err. However, if p is sufficiently big (specifically,
if it exceeds $\max(.5, f_0)$) then increases in p do decrease the chance of mistakes. (When
$p > .5$ then a higher p decreases the decision maker's confusion, thus raising the weight on
individualistic judgment, which is benign since $p > f_0$.)
An intriguing issue concerns the relative competence of high status and low
status agents. In diffusions that tap into a relatively strong scientific or techni- 
cal base (pharmaceuticals, computer hardware, etc.), one would expect higher 
status agents to be more likely to make the right choice (higher p's). However, 
there may be circumstances in which status is purely a subjective phenomenon, 
lacking any objective correlate. (People inside the community may believe that 
higher status is correlated with more expertise but this would be an illusion.) 

6.2.2 Clustered Interaction 

Most diffusions occur in communities that exhibit biased interactions. In general,
executives in the same industry are more likely to interact with each other
than with executives in a different industry. School superintendents in the same 
state are more likely to encounter each other than are superintendents in differ- 
ent states. (At the limit — if these subcommunities were completely sealed off 
from each other — then our model applies as it is to each subcommunity.) 10 

6.3 Leakage of Information 

Here, p would be a function of time. Typically we would expect that p would in- 
crease over time, as information about the technology (by the already-converted 
or by third parties) leaks out. As already noted, the baseline diffusion process 
(stationary p) is robust: small positive shocks to p will on average have small 
effects on the fraction of the population that adopts option A. 

6.4 Waves of Innovations 

Today it seems that the business world never rests. No sooner has one innovation 
passed from the scene — or at least from public attention — than another one 
appears. Because there are good reasons for this (it is not accidental [Abr96]),
one can expect this pattern to continue indefinitely.

Although waves of innovation will obviously produce some new patterns, 
we suspect that some of the present paper's results will continue to hold. In 
particular, we conjecture that if the waves are composed of soft technologies, 
with p around 1/2 and strong propensities to imitate, then the process will 
continue to be Polya-like in two senses. First, objectively, uncertainty will be
very great: many adoption patterns will be possible ex ante. Second, if the 
waves do not come too often then sample paths of diffusion will settle down. 
Hence, subjectively it will feel as if the process is moving toward a predetermined 
equilibrium — an illusion. 

10 For a study of the influence of social structure on opinion formation, see e.g. 
7 Conclusion



Because they diffuse with little objective information about their effects, soft 
technologies pose challenges for decision makers. Psychologists have long argued 
that when faced with such choice problems, people use a reasonable though im- 
perfect imitation heuristic. We have presented a mathematical model of diffu- 
sion that combines this heuristic with agents' efforts to make factually-grounded 
decisions and we established both analytically and computationally that such 
processes exhibit clear stochastic properties. We then showed that the dynamics
of the model lead to outcomes that appear to be deterministic in spite of
being governed by a stochastic process. In other words, when the objective
evidence for the adoption of a soft technology is weak, any sample path of this
process quickly settles down to a fraction of adopters that is not predetermined 
by the initial conditions: ex ante, every outcome is just as (un)likely as every 
other. When the objective evidence is strong, the process settles down to a 
value that is determined by the joint effect of the quality of the evidence and 
the agents' competence. In neither case does the proportion of adopters settle 
into either zero or one: pure herding does not occur except in parametrically 
extreme situations. 

Further, unlike informational cascades generated by fully rational actors, the 
process of the present model is robust: diffusions that have for a long time tilted 
massively toward one option cannot be suddenly derailed by small infusions of 
new public information. The fragility of cascades generated by fully Bayesian 
agents is, we believe, an artifact of unrealistic assumptions of hyper-rationality. 
Diffusions may be initially volatile, as they are in the present model, but we 
believe that these processes stabilize once the weight of public opinion has been 
brought to bear. 11 

11 Of course, the introduction of a new innovation can, by restarting the process, desta- 
bilize it. But that is not what is producing fragility in the full-rationality cascades: these 
are not robust against small shocks associated with the pre-existing options. Moreover, as 
indicated earlier, we believe that the present model can be extended to accommodate waves 
of innovation. 
Appendix



Consider the following variation of Polya's urn. There are two types of alternatives,
A and B. At time $t = 0$ there are $s$ agents who have already made their
choices, of which $m_0$ have chosen A and $n_0$ have chosen B. Denote the initial
fraction of A-adherents to be $f_0 = m_0/s$. Define the dynamics recursively as
follows. Let $(M_t, N_t)$ be the (random) numbers of agents who have chosen A and B,
respectively, by the end of period $t$. In period $t+1$, an agent chooses A with probability

\[ P_t = ap + (1-a)\frac{M_t}{M_t + N_t}, \tag{9} \]

where $a, p \in [0, 1]$, and chooses B with probability $1 - P_t$. If $a = 0$ this becomes
a standard Polya's urn process. In the rest of this appendix we will assume
$a > 0$.

We are interested in studying the (random) fraction of agents who have
chosen A up to period $t$:

\[ F_t = \frac{M_t}{M_t + N_t}. \tag{10} \]

Conditional on the information at time $t$, with probability $P_t$,

\[ \Delta F_{t+1} = F_{t+1} - F_t = \frac{M_t + 1}{M_t + N_t + 1} - \frac{M_t}{M_t + N_t} = \frac{1 - F_t}{t + s + 1}, \tag{11} \]

while with probability $1 - P_t$,

\[ \Delta F_{t+1} = \frac{M_t}{M_t + N_t + 1} - \frac{M_t}{M_t + N_t} = \frac{-F_t}{t + s + 1}. \tag{12} \]

Hence

\[ E_t[\Delta F_{t+1}] = \frac{P_t - F_t}{t + s + 1} = \frac{a(p - F_t)}{t + s + 1}. \tag{13} \]

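Separating out the mean term, we can write

\[ \Delta F_{t+1} = \frac{a(p - F_t)}{t + s + 1} + \frac{X_{t+1}}{t + s + 1}, \tag{14} \]

where $X_{t+1}$ is a martingale difference with the conditional distribution

\[ X_{t+1} = \begin{cases} 1 - P_t & \text{with probability } P_t, \\ -P_t & \text{with probability } 1 - P_t. \end{cases} \tag{15} \]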

Eq. (14) falls into the general class of stochastic approximation processes
that have been studied intensively in the statistics literature [RM51], [Pem01].
In what follows it is convenient to introduce an auxiliary random variable

\[ G_t = F_t - p, \tag{16} \]

so that Eq. (14) can be written as

\[ \Delta G_{t+1} = \frac{-aG_t + X_{t+1}}{t + s + 1}. \tag{17} \]
Proposition 2. $EF_t \to p$ monotonically as $t \to \infty$.

Proof. From Eq. (17) we have

\[ G_{t+1} = \left(1 - \frac{a}{t + s + 1}\right) G_t + \frac{X_{t+1}}{t + s + 1}. \tag{18} \]

Taking expectation of both sides, we have

\[ EG_t = \left(1 - \frac{a}{t + s}\right) EG_{t-1} = g_0 \prod_{k=0}^{t-1} \left(1 - \frac{a}{k + s + 1}\right), \tag{19} \]

where $g_0 = f_0 - p$. Because

\[ \sum_{k=0}^{\infty} \frac{1}{k + s + 1} = \infty, \tag{20} \]

the infinite product in Eq. (19) converges to zero monotonically. Thus $EG_t \to 0$
monotonically, or $EF_t \to p$ monotonically. □
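Proposition 2 can be checked numerically for small cases. The sketch below is an illustration rather than part of the proof: it builds the exact distribution of $M_t$ by forward recursion on the choice probability $ap + (1-a)M_t/(t+s)$ and compares the resulting mean of $F_t$ with the product formula of Eq. (19), read here as $EG_t = g_0 \prod_{k=0}^{t-1} (1 - a/(k+s+1))$ with $g_0 = f_0 - p$. All function names and parameter values are assumptions chosen for the example.

```python
def distribution_of_m(a, p, m0, n0, t):
    """Exact distribution of M_t as a dict {m: probability}."""
    s = m0 + n0
    dist = {m0: 1.0}
    for step in range(t):
        total = s + step                      # M_t + N_t = t + s
        nxt = {}
        for m, prob in dist.items():
            p_a = a * p + (1 - a) * m / total  # probability the next agent picks A
            nxt[m + 1] = nxt.get(m + 1, 0.0) + prob * p_a
            nxt[m] = nxt.get(m, 0.0) + prob * (1 - p_a)
        dist = nxt
    return dist

def mean_f_exact(a, p, m0, n0, t):
    """E[F_t] computed from the exact distribution of M_t."""
    s = m0 + n0
    dist = distribution_of_m(a, p, m0, n0, t)
    return sum(prob * m / (s + t) for m, prob in dist.items())

def mean_f_closed_form(a, p, m0, n0, t):
    """E[F_t] = p + g_0 * prod_{k<t} (1 - a / (k + s + 1))."""
    s = m0 + n0
    g = m0 / s - p
    for k in range(t):
        g *= 1 - a / (k + s + 1)
    return p + g

for t in (1, 5, 25, 100):
    exact = mean_f_exact(0.5, 0.7, 1, 1, t)
    closed = mean_f_closed_form(0.5, 0.7, 1, 1, t)
    print(f"t={t:3d}  exact={exact:.6f}  closed form={closed:.6f}")
```

The two columns agree up to floating-point error, and the mean rises monotonically from $f_0 = 0.5$ toward $p = 0.7$.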

Lemma 1. Assume $a > 0$. If

\[ y_n = x_n + a \sum_{k=1}^{n-1} \frac{x_k}{k} \tag{21} \]

converges, then $x_n \to 0$.

Proof. Suppose $x_n$ does not converge to 0. Then without loss of generality we
can assume that $x_n > \epsilon > 0$ infinitely often. It must also be that $x_n < \epsilon/2$
infinitely often, otherwise we would have $x_n \ge \epsilon/2$ eventually and $y_n$ would
diverge. Because $y_n$ is Cauchy, we can find $N$ such that $|y_m - y_n| < \epsilon/4$ for all
$m, n > N$. Pick $n' > \max\{N, 2a\}$ such that $x_{n'} < \epsilon/2$. Pick $m > n'$ such that
$x_m > \epsilon$. Let $n$ be the largest integer such that $n' \le n < m$ and $x_n < \epsilon/2$. It is
clear that $m, n$ chosen this way satisfy the following conditions:

\[ m > n > N, \quad x_n < \epsilon/2, \quad x_m > \epsilon, \quad n < i < m \Rightarrow x_i \ge \epsilon/2 > 0. \tag{22} \]

Now we have

\[ x_m - x_n = y_m - y_n - a \sum_{k=n}^{m-1} \frac{x_k}{k} \le y_m - y_n + \frac{a}{n}(x_m - x_n) < \frac{\epsilon}{4} + \frac{1}{2}(x_m - x_n). \tag{23} \]

Hence $x_m - x_n < \epsilon/2$. But this contradicts the fact that $x_n < \epsilon/2$ and
$x_m > \epsilon$. □
Proposition 3. $F_t \to p$ a.s. as $t \to \infty$.

Proof. Replacing $t$ by $k$ and summing over $k = 0, \ldots, t-1$ in Eq. (17), we find a
martingale:

\[ G_t - g_0 + a \sum_{k=0}^{t-1} \frac{G_k}{k + s + 1} = \sum_{k=0}^{t-1} \frac{X_{k+1}}{k + s + 1}. \tag{24} \]

Furthermore, this martingale is nonnegative. The martingale convergence theorem
thus ensures that $G_t$ converges to a (random) limit $G_\infty$ with probability
1. From Eq. (24) and Lemma 1, the almost sure convergence of $G_t$ implies that
$G_t \to 0$ almost surely, or $F_t \to p$ almost surely. □

Proposition 4. A higher value of $p$ yields a distribution of $F_t$ that stochastically
dominates a distribution of $F_t$ produced by a lower value of $p$, for all $t > 0$.

Proof. Suppose $p' > p$. It suffices to show by induction that for all $t$, $M'_t$
stochastically dominates $M_t$, or $P[M'_t > m] \ge P[M_t > m]$ for all $m \in \mathbb{N}$. The
statement is correct for $t = 0$ because $m'_0 = m_0$. Suppose the statement is
correct for $t$; we prove it for $t + 1$.

Case 1. $P[M'_t = m] \ge P[M_t = m]$.

\[
\begin{aligned}
P[M'_{t+1} > m] &= P[M'_t > m] + P[M'_t = m] \left[ap' + (1-a)\frac{m}{t+s}\right] \\
&\ge P[M_t > m] + P[M_t = m] \left[ap + (1-a)\frac{m}{t+s}\right] \\
&= P[M_{t+1} > m].
\end{aligned} \tag{25}
\]

Case 2. $P[M'_t = m] < P[M_t = m]$.

\[
\begin{aligned}
P[M'_{t+1} \le m] &= P[M'_t < m] + P[M'_t = m] \left[a(1-p') + (1-a)\left(1 - \frac{m}{t+s}\right)\right] \\
&\le P[M_t < m] + P[M_t = m] \left[a(1-p) + (1-a)\left(1 - \frac{m}{t+s}\right)\right] \\
&= P[M_{t+1} \le m].
\end{aligned} \tag{26}
\]
□
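The dominance claim of Proposition 4 can be spot-checked on exact distributions. This is a sketch under stated assumptions, not a substitute for the induction: the helper names (`distribution_of_m`, `upper_tail`, `dominates`) and the parameter values are hypothetical, and the dynamics are the choice probability $ap + (1-a)M_t/(t+s)$ defined at the start of the Appendix.

```python
def distribution_of_m(a, p, m0, n0, t):
    """Exact distribution of M_t as a dict {m: probability}."""
    s = m0 + n0
    dist = {m0: 1.0}
    for step in range(t):
        nxt = {}
        for m, prob in dist.items():
            p_a = a * p + (1 - a) * m / (s + step)
            nxt[m + 1] = nxt.get(m + 1, 0.0) + prob * p_a
            nxt[m] = nxt.get(m, 0.0) + prob * (1 - p_a)
        dist = nxt
    return dist

def upper_tail(dist, m):
    """P[M_t > m]."""
    return sum(prob for k, prob in dist.items() if k > m)

def dominates(dist_hi, dist_lo, tol=1e-12):
    """True if dist_hi first-order stochastically dominates dist_lo."""
    support = set(dist_hi) | set(dist_lo)
    return all(upper_tail(dist_hi, m) >= upper_tail(dist_lo, m) - tol for m in support)

a, m0, n0, t = 0.5, 1, 1, 40
low = distribution_of_m(a, 0.3, m0, n0, t)    # lower p
high = distribution_of_m(a, 0.6, m0, n0, t)   # higher p'
print("p' = 0.6 dominates p = 0.3:", dominates(high, low))
print("p = 0.3 dominates p' = 0.6:", dominates(low, high))
```

Since $M_t + N_t = t + s$ is deterministic, dominance of $M'_t$ over $M_t$ is the same as dominance of $F'_t$ over $F_t$.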



Proposition 5.

(i) If $f_0 < p$, then $EF_t$ is increasing in $a$, for all $t > 0$.

(ii) If $f_0 > p$, then $EF_t$ is decreasing in $a$, for all $t > 0$.

Proof. See Eq. (19). □
Proposition 6. $\partial(\operatorname{var} F_t)/\partial a < 0$ for all $t > 0$ iff either

(i) $f_0 < p$ and $p_0 = (1-a)f_0 + ap > 1/2$, or

(ii) $f_0 > p$ and $p_0 = (1-a)f_0 + ap < 1/2$.

Proof. "If" part. Taking variance of both sides of Eq. (18), we have

\[ \operatorname{var} G_{t+1} = \left(1 - \frac{a}{t+s+1}\right)^2 \operatorname{var} G_t + \frac{1}{(t+s+1)^2} \operatorname{var} X_{t+1}. \tag{27} \]

The variance of $X_{t+1}$ can be calculated as follows. First note that

\[ P_t = ap + (1-a)F_t = p + (1-a)G_t. \tag{28} \]

Then we can write

\[
\begin{aligned}
\operatorname{var} X_{t+1} &= E[X_{t+1}^2] = E[E_t[X_{t+1}^2]] = E[P_t(1-P_t)] \\
&= EP_t(1-EP_t) - \operatorname{var} P_t \\
&= EP_t(1-EP_t) - (1-a)^2 \operatorname{var} G_t.
\end{aligned} \tag{29}
\]

Plugging this back into Eq. (27), we obtain a recursive relation for $\operatorname{var} G_t$:

\[ \operatorname{var} G_{t+1} = \frac{(t+s+2-2a)(t+s)}{(t+s+1)^2} \operatorname{var} G_t + \frac{EP_t(1-EP_t)}{(t+s+1)^2}, \tag{30} \]

where

\[ EP_t = p + (1-a)EG_t. \tag{31} \]

It is clear from this recursive relation that the conclusion holds if

\[ \frac{\partial (EP_t(1-EP_t))}{\partial a} < 0. \tag{32} \]

Case (i). By hypothesis $g_0 < 0$. From Eq. (19) we see that $EG_t$ (being a
negative sequence) is increasing in both $t$ and $a$, and so is $EP_t$ by Eq. (31). Therefore
$EP_t \ge p_0 > 1/2$. Note that the function $f(x) = x(1-x)$ is decreasing in $x$
when $x > 1/2$. Thus $EP_t(1-EP_t)$ is decreasing in $a$ for all $t$.

Case (ii). Similar to (i).

"Only if" part. Letting $t = 0$ in Eq. (30), we have

\[ \operatorname{var} G_1 = \frac{p_0(1-p_0)}{(s+1)^2}. \tag{33} \]

Taking partial derivative with respect to $a$, we obtain

\[ \frac{\partial(\operatorname{var} G_1)}{\partial a} = \frac{(f_0 - p)(2p_0 - 1)}{(s+1)^2}. \tag{34} \]

Thus, in order to have $\partial(\operatorname{var} F_1)/\partial a < 0$, we must have $(f_0 - p)(2p_0 - 1) < 0$,
i.e., either (i) or (ii). □
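Proposition 7.

(i) $\partial(\operatorname{var} F_t)/\partial p < 0$ for all $t > 0$ if $p_0 > 1/2$ and $p > 1/2$;

(ii) $\partial(\operatorname{var} F_t)/\partial p > 0$ for all $t > 0$ if $p_0 < 1/2$ and $p < 1/2$.

Proof. We see from Eq. (31) and Eq. (19) that

\[ \frac{\partial(EP_t)}{\partial p} = 1 + (1-a)\frac{\partial(EG_t)}{\partial p} = 1 - (1-a)\prod_{k=0}^{t-1}\left(1 - \frac{a}{k+s+1}\right) > 0. \tag{35} \]

Taking partial derivative with respect to $p$ on both sides of Eq. (30) yields

\[ \frac{\partial(\operatorname{var} G_{t+1})}{\partial p} = \frac{(t+s+2-2a)(t+s)}{(t+s+1)^2} \frac{\partial(\operatorname{var} G_t)}{\partial p} + \frac{1 - 2EP_t}{(t+s+1)^2} \frac{\partial(EP_t)}{\partial p}. \tag{36} \]

The result now follows from the facts that (i) $EP_t > 1/2$ for all $t > 0$ if $p > 1/2$
and $p_0 > 1/2$, and (ii) $EP_t < 1/2$ for all $t > 0$ if $p < 1/2$ and $p_0 < 1/2$. □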

Proposition 8. Suppose $F'_t$ is a bigger process than $F_t$ in that $m'_0 = km_0$ and
$n'_0 = kn_0$, where $k$ is an integer larger than one. In all other respects the two
processes are identical.

(i) $EF'_t < EF_t$ for all $t > 0$ iff $f_0 < p$.

(ii) $EF'_t > EF_t$ for all $t > 0$ iff $f_0 > p$.

Proof. When $f_0 < p$ we have $g_0 < 0$. In this case Eq. (19) shows that $EG_t$
is decreasing in $s$. Hence $EF'_t < EF_t$. When $f_0 > p$ we have $EF'_t > EF_t$
similarly. □

Proposition 9. All else equal, the distribution of $F_t$ is stochastically increasing,
in a strong sense, in $f_0$, for all $t > 0$.

Proof. We only need to show that $M_t$ is stochastically increasing in $f_0$ in a
strong sense. We prove this by induction. The proposition obviously holds
for $t = 1$. Suppose it is correct for $t$; we prove it for $t + 1$. Let $M_t$ and $M'_t$ be the
two stochastic processes generated by $f_0$ and $f'_0$, assuming that $f_0 < f'_0$. By the
induction hypothesis, there exist $s < a < b < c < d < t + s$ such that
$P[M'_t = i] < P[M_t = i]$ for all integers $i \in [a, b]$, $P[M'_t = i] = P[M_t = i]$ for all integers
$i \in [b + 1, c - 1]$ (possibly none), and $P[M'_t = i] > P[M_t = i]$ for all integers
$i \in [c, d]$. Furthermore, $[a, d]$ covers the support of $M_t$ and $M'_t$.
For all $i = c + 1, \ldots, d + 1$, we have

\[
\begin{aligned}
P[M'_{t+1} = i] &= P[M'_t = i-1]\left[ap + (1-a)\frac{i-1}{t+s}\right] + P[M'_t = i]\left[a(1-p) + (1-a)\left(1 - \frac{i}{t+s}\right)\right] \\
&> P[M_t = i-1]\left[ap + (1-a)\frac{i-1}{t+s}\right] + P[M_t = i]\left[a(1-p) + (1-a)\left(1 - \frac{i}{t+s}\right)\right] \\
&= P[M_{t+1} = i].
\end{aligned} \tag{37}
\]

Similarly, for all $i = a, \ldots, b$, we have $P[M'_{t+1} = i] < P[M_{t+1} = i]$.

To study the relationship between the probability atoms of $M'_{t+1}$ and $M_{t+1}$
at the "boundaries", distinguish the following two cases:

Case 1. There are no intermediate "equal" states ($b + 1 = c$).

Whatever the relationship between $P[M'_{t+1} = c]$ and $P[M_{t+1} = c]$, we have
$M'_{t+1}$ strongly dominates $M_{t+1}$.

Case 2. There exist some intermediate "equal" states ($b + 1 < c$).

It can be checked, in a fashion similar to Eq. (37), that (a) $P[M'_{t+1} = b + 1] < P[M_{t+1} = b + 1]$,
(b) $P[M'_{t+1} = c] > P[M_{t+1} = c]$, and (c) $P[M'_{t+1} = i] = P[M_{t+1} = i]$
for all $i = b + 2, \ldots, c - 1$ (if any). Hence $M'_{t+1}$ strongly
dominates $M_{t+1}$. □

Proposition 10. $\partial(\operatorname{var} F_t)/\partial f_0 < 0$ for all $t > 0$ if $p_0 > 1/2$ and $p > 1/2$.

Proof. Using Eq. (31) and Eq. (19), it can be calculated that

\[ \frac{\partial(EP_t)}{\partial f_0} = (1-a)\prod_{k=0}^{t-1}\left(1 - \frac{a}{k+s+1}\right). \tag{38} \]

If $p_0 > 1/2$ and $p > 1/2$, then because $EP_t$ converges to $p$ monotonically, we
have $EP_t > 1/2$ for all $t > 0$, and therefore

\[ \frac{\partial(EP_t(1-EP_t))}{\partial f_0} < 0 \tag{39} \]

for all $t > 0$. The result now follows from Eq. (30). □

Proposition 11.

(i) If $1/2 < a < 1$, then as $t \to \infty$, $\sqrt{t}(F_t - p)$ converges in distribution to
a normal distribution with mean zero and variance $p(1-p)/(2a-1)$. In
particular, $\operatorname{var} F_t = O(1/t)$.

(ii) If $0 < a < 1/2$, then $\operatorname{var} F_t \to 0$ slower than $O(1/t)$.
Proof. To understand this proposition, define $v_t = (t+s)\operatorname{var} G_t$ and rewrite
Eq. (30) as

\[ v_{t+1} - v_t = \frac{EP_t(1-EP_t) - (2a-1)v_t}{t+s+1}. \tag{40} \]

When $t$ is large, we have

\[ \Delta v_{t+1} \approx \frac{p(1-p) - (2a-1)v_t}{t}. \tag{41} \]

Hence if $2a - 1 > 0$, $v_t$ is dragged to the limit $p(1-p)/(2a-1)$.

It is also clear from Eq. (40) that when $0 < a < 1/2$ we have $v_t \to \infty$ in
general, so $\operatorname{var} G_t$ converges slower than $O(1/t)$.

A rigorous proof for (i), however, is too technical to be presented here. Various
authors have given proofs for general stochastic approximation processes.
For references see [Blu54], [Chu54]. □
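The heuristic argument can be illustrated by iterating the variance recursion numerically. The sketch below is not a proof and rests on an assumed reading of Eq. (30), namely $\operatorname{var} G_{t+1} = [(t+s+2-2a)(t+s)\operatorname{var} G_t + EP_t(1-EP_t)]/(t+s+1)^2$ with $EP_t = p + (1-a)EG_t$ and $EG_{t+1} = (1 - a/(t+s+1))EG_t$. The function name `scaled_variance` and the parameter values are illustrative choices.

```python
def scaled_variance(a, p, m0, n0, t):
    """Return v_t = (t + s) * var G_t by iterating the variance recursion."""
    s = m0 + n0
    mean_g = m0 / s - p      # E G_0 = g_0
    var_g = 0.0              # F_0 is deterministic
    for k in range(t):
        n = k + s                                  # number of agents before round k+1
        mean_p = p + (1 - a) * mean_g              # E P_k
        var_g = ((n + 2 - 2 * a) * n * var_g + mean_p * (1 - mean_p)) / (n + 1) ** 2
        mean_g *= 1 - a / (n + 1)                  # E G_{k+1}
    return (t + s) * var_g

p = 0.3
a_high = 0.9
limit = p * (1 - p) / (2 * a_high - 1)
for t in (10, 1_000, 100_000):
    v = scaled_variance(a_high, p, 1, 1, t)
    print(f"a={a_high}:  t={t:6d}  v_t={v:.5f}  (limit {limit:.5f})")

a_low = 0.25
for t in (10, 1_000, 100_000):
    print(f"a={a_low}: t={t:6d}  v_t={scaled_variance(a_low, p, 1, 1, t):.3f}")
```

For $a = 0.9$ the scaled variance approaches $p(1-p)/(2a-1)$, while for $a = 0.25$ it keeps growing with $t$, in line with parts (i) and (ii).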






